Abstract

Several  models  of  probabilistic  systems  comprise  both  probabilistic  and  nondeterministic 
choice.  In  such  models,  the  resolution  of  nondeterministic  choices  is  mediated  by  the  concept 
of policies (sometimes called adversaries). A policy is a criterion for choosing among nondeterministic
alternatives on the basis of the past sequence of states of the system. By fixing
the  resolution  of  nondeterministic  choice,  a  policy  reduces  the  system  to  an  ordinary  stochastic 
system,  thus  making  it  possible  to  reason  about  the  probability  of  events  of  interest. 

A  partial  information  policy  is  a  policy  that  can  observe  only  a  portion  of  the  system  state, 
and  that  must  base  its  choices  on  finite  sequences  of  such  partial  observations.  We  argue  that 
in  order  to  obtain  accurate  estimates  of  the  worst-case  performance  of  a  probabilistic  system, 
it  would  often  be  desirable  to  consider  partial-information  policies.  However,  we  show  that 
even  when  considering  memoryless  partial-information  policies,  the  problem  of  deciding  whether 
the system can stay forever with positive probability in a given subset of states becomes NP-complete.
As a consequence, many verification problems that can be solved in polynomial
time under perfect-information policies, such as the model checking of pCTL or the computation
of the worst-case long-run average outcome of tasks, become NP-hard under memoryless
partial-information policies. On the positive side, we show that the worst-case long-run average
outcome of tasks under memoryless partial-information policies can be computed by solving
a nonlinear programming problem, opening the way to the use of numerical approximation
algorithms.


1  Introduction 

In  several  models  of  probabilistic  systems,  probabilistic  and  nondeterministic  choice  coexist.  While 
probabilistic  choice  provides  a  statistical  characterization  of  the  system  behavior,  nondeterminism 
is  used  to  model  concurrency  [Var85,  PZ86,  SL94],  and  lack  of  knowledge  of  transition  probabilities 
[Seg95,  dA97]  and  transition  rates  [dA98b].  In  such  a  model,  the  probability  of  events  depends  on 
the  way  the  nondeterministic  choices  are  resolved  during  the  behavior  of  the  system.  To  assign  a 

*This  paper  appeared  in  the  Proceedings  of  the  Workshop  on  Probabilistic  Methods  in  Verification,  published  as 
Technical  Report  CSR-99-9,  pages  19-32,  University  of  Birmingham,  1999.  This  research  was  supported  in  part  by 
the NSF CAREER award CCR-9501708, by the DARPA (NASA Ames) grant NAG2-1214, by the DARPA (Wright-Patterson
AFB) grant F33615-98-C-3614, by the ARO MURI grant DAAH-04-96-1-0341, and by the Gigascale Silicon
Research  Center. 
probability to the events, it is customary to use the notion of policy [Bel57], closely related to the
schedulers of [Var85] and the adversaries of [SL94]. Whenever the choice among nondeterministic
alternatives arises, a policy dictates the probability of choosing each alternative, possibly as a
function  of  the  past  sequence  of  states  visited  by  the  system.  Hence,  once  the  policy  is  specified, 
the  nondeterminism  present  in  the  system  is  resolved,  and  the  system  is  thus  reduced  to  a  purely 
probabilistic  system.  In  the  statement  of  verification  problems,  nondeterminism  is  usually  assigned 
a demonic role: a property is considered to hold iff it holds under any possible resolution of nondeterministic
choice, or equivalently, under any policy. Perfect-information and partial-information
policies  correspond  to  demons  with  different  observation  powers.  A  perfect-information  policy  is  a 
policy  that  can  observe  the  complete  description  of  the  system  state,  and  that  can  select  among  the 
nondeterministic  alternatives  on  the  basis  of  the  finite  sequence  of  states  traversed  by  the  system. 
In  contrast,  a  partial-information  policy  can  only  observe  part  of  the  system  state,  and  it  must 
select  among  the  alternatives  on  the  basis  of  finite  sequences  of  such  incomplete  observations. 

To  understand  why  partial-information  policies  can  lead  to  more  accurate  estimates  of  the 
worst-case system performance, consider the following example. Consider a telecommunication
network that routes phone calls between nodes, and consider two users $u_1$ and $u_2$, attached to two
nodes of the network. When user $u_1$ tries to call user $u_2$, he is either connected to user $u_2$, or
he receives a busy signal, indicating that there are no connections available to route the call. We
intend to model the system in order to study the long-run average fraction of successful calls. To
simplify the example, we assume that we have enough statistical information about the network
to model the number of connections available between any pair of nodes as a purely probabilistic
process, without any nondeterminism. On the other hand, we do not have precise information on
when our particular user $u_1$ wishes to call $u_2$, so that we model the choice to place a call as a
nondeterministic choice. From the point of view of $u_1$, a state $s$ of the system consists of four components
$s = (s[c], s[u_1], s[r], s[n])$, where:

•  s[c]  is  a  portion  of  the  state  visible  to  everyone  (e.g.,  the  current  time  of  the  day); 

• $s[u_1] \in \{\mathit{idle}, \mathit{trying}, \mathit{connected}\}$ describes the state of $u_1$;

• $s[n] \in \{0, \ldots, N\}$ is the number of available connections for calls between $u_1$ and $u_2$; if
$s[n] = 0$, then no call from $u_1$ to $u_2$ can take place;

• $s[r]$ is a portion of the state visible only to the network (e.g., the state of other communication
links  and  routing  tables). 

If $s[u_1]$ = idle, there is a nondeterministic choice between staying at idle (i.e., not placing a call),
or going to trying (i.e., dialing the number and waiting for the connection to be established). If
$s[u_1]$ = trying and $s[n] > 0$, then the connection is established, and $u_1$ proceeds to connected; if
$s[u_1]$ = trying and $s[n] = 0$, user $u_1$ gets a busy signal, and returns to idle. If $s[u_1]$ = connected,
user $u_1$ can either remain in this state, or hang up and proceed to idle. In Section 3 we present
the  formal  model  of  a  system  similar  to  the  one  described  above. 

If we model the communication system as indicated above, and study the system under perfect-information
policies, we obtain that in the worst case the long-run fraction of successful calls of $u_1$ is 0,
independently of how many free connections there are on average between $u_1$ and $u_2$! In fact,
at states where $s[u_1]$ = idle the choice of whether to stay at idle or try to place a call (going to
trying) is nondeterministic. In such a state, a perfect-information policy can look at the value
of $s[n]$ before deciding whether to place a call. Hence, a worst-case perfect-information policy will
place a call only from states where $s[n] = 0$, i.e., where all the connections between $u_1$ and $u_2$ are
busy. While the value 0 is indeed a lower bound for the long-run average fraction of successful
calls from $u_1$ to $u_2$, this answer takes an unrealistic, and overly pessimistic, view of the system. In
fact, the use of nondeterminism to model the decision of $u_1$ to place a call is intended to model
an unknown dependency between the frequency with which $u_1$ places calls, and global information
such as the time of the day. It is unrealistic to assume that $u_1$ can base his decision to place a
call on the number of free connections, since such information would not be available to $u_1$ in a
real telecommunication system. In order to obtain a more realistic worst-case analysis, we need to
consider partial-information policies, which can base their decision of whether to place a call on
the state of $u_1$ and on global information in $s[c]$, but not on information that is internal to the
network, such as the number of free connections between $u_1$ and $u_2$.

The  telecommunication  example  also  suggests  why  the  need  for  partial-information  policies  is 
more  felt  in  the  analysis  of  probabilistic  systems  than  in  the  analysis  of  purely  nondeterministic 
ones.  In  a  purely  nondeterministic  system,  we  are  generally  interested  in  the  possibility  of  events, 
rather than in their frequency. Hence, all finite sequences of events, however rare they might be,
are  taken  into  account  for  establishing  a  property.  In  purely  nondeterministic  systems,  the  concept 
of  fairness  is  normally  used  instead  of  partial  information  to  guard  against  an  infinite  number  of 
unfortunate  coincidences,  such  as  trying  to  place  a  call  always  only  when  no  connection  is  free.  In 
the  above  example,  we  can  rule  out  such  behaviors  by  adding  a  fairness  condition  to  the  system, 
requiring  that  the  choice  between  placing  a  call  and  staying  in  idle  should  be  (strongly)  fair  at 
all  states.  Even  though  fairness  and  partial  information  are  not  equivalent,  fairness  is  preferred 
because  it  is  amenable  to  simpler  verification  methods.  However,  fairness  is  not  a  substitute  for 
partial  information  in  the  study  of  the  frequency  of  events.  For  example,  the  above  fairness  condition 
on  placing  calls  does  not  ensure  that  the  frequency  of  placing  calls  is  not  influenced  by  the  number 
of  free  connections.  In  fact,  even  with  this  fairness  condition,  the  worst-case  long-run  average 
fraction  of  calls  that  are  successful  is  arbitrarily  close  to  0:  the  fairness  condition  is  still  satisfied 
if  a  fraction  of  the  calls  smaller  than  1,  but  arbitrarily  close  to  1,  is  placed  from  states  where  no 
connection  is  free.  We  note  that  fairness  in  probabilistic  systems  can  be  used  as  a  surrogate  for 
partial  information  when  the  specification  languages  cannot  refer  to  the  frequency  of  events,  as  is 
the  case  for  the  logic  pCTL  [BK98,  dA99]. 

Our model for systems with both probabilistic and nondeterministic choice is that of Markov decision
processes [Bel57, Der70], which are closely related to several models proposed in the literature
[Var85, PZ86, SL94]. We consider the confinement problem, consisting in deciding whether there is
a policy in a given class of policies that enables us to stay forever in a specified subset of states with
probability greater than 0. While the confinement problem is solvable in polynomial time for general
policies, from [Rei84] we have that the confinement problem is EXPTIME-complete for partial-information
policies. We show that the confinement problem is NP-complete even if we restrict
our attention to memoryless and limit-memoryless partial-information policies. Limit-memoryless
partial-information policies are policies whose state-action frequencies converge to those of a memoryless
partial-information policy. These results imply that the model-checking problem for pCTL specifications,
and the problem of computing the worst-case long-run average outcome of tasks [dA98a],
which can be solved in polynomial time under perfect-information policies, are EXPTIME-hard
under partial-information policies, and NP-hard under memoryless and limit-memoryless partial-information
policies. On the positive side, we show that the worst-case long-run average outcome
of tasks under memoryless and limit-memoryless partial-information policies can be computed by
solving a nonlinear optimization problem.

The  paper  is  organized  as  follows.  In  Section  2  we  describe  Markov  decision  processes  and 
partial-information  policies,  and  we  define  the  confinement  problem.  In  Section  3  we  present  the 
machinery for defining the long-run average outcome of tasks, and we describe a simple telecommunication
example that helps to motivate the consideration of partial-information policies. In
Section 4 we present lower-bound results on the complexity of pCTL model checking and computation
of long-run average outcomes under partial information. Section 5 presents the optimization
problem that enables the computation of the worst-case long-run average outcome of tasks under
memoryless partial-information policies, and Section 6 contains some concluding comments.

2  Markov  Decision  Processes  and  Partial-Information  Policies 

Our model for probabilistic systems is a Markov decision process (MDP). An MDP is a generalization
of a Markov chain in which a set of possible actions is associated with each state. To each
state-action pair corresponds a probability distribution on the states, which is used to select the
successor state [Der70]. Markov decision processes are closely related to the probabilistic automata
of [Rab63], the concurrent Markov chains of [Var85], and the simple probabilistic automata of
[SL94, Seg95]. Given a countable set $C$ we denote by $\mathcal{D}(C)$ the set of probability distributions over
$C$, i.e. the set of functions $f : C \to [0,1]$ such that $\sum_{x \in C} f(x) = 1$. An MDP $V = (S, \mathit{Acts}, A, p)$
consists of the following components:

•  A  set  S  of  states. 

•  A  set  Acts  of  actions. 

• A function $A : S \to 2^{\mathit{Acts}}$, which associates with each $s \in S$ a finite set $A(s) \subseteq \mathit{Acts}$ of actions
available at $s$.

• A function $p : S \times \mathit{Acts} \to \mathcal{D}(S)$, which associates with each $s, t \in S$ and $a \in A(s)$ the
probability $p(s,a)(t)$ of a transition from $s$ to $t$ when action $a$ is selected.

We measure the complexity of the algorithms as a function of the size of the MDP $V$, defined
as $\sum_{s \in S} |A(s)|$. A path of an MDP is an infinite sequence $s_0, a_0, s_1, a_1, \ldots$ of alternating states
and actions, such that $s_i \in S$, $a_i \in A(s_i)$ and $p(s_i, a_i)(s_{i+1}) > 0$ for all $i \ge 0$. For $i \ge 0$, the
sequence is constructed by iterating a two-phase selection process. First, an action $a_i \in A(s_i)$ is
selected nondeterministically; second, the successor state $s_{i+1}$ is chosen according to the probability
distribution $p(s_i, a_i)$. Given a path $s_0, a_0, s_1, a_1, \ldots$ and $k \ge 0$, we denote by $X_k$, $Y_k$ its generic
$k$-th state $s_k$ and its generic $k$-th action $a_k$, respectively. For $n \ge 0$, we call a finite portion
$s_0, a_0, s_1, \ldots, s_n$ of a path a finite path prefix.
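The two-phase selection process can be illustrated with a small sketch. The MDP below is a hypothetical toy instance (it does not come from the paper), the dictionaries `A` and `p` are one possible encoding of the functions $A$ and $p$, and the callback `choose` stands in for whatever resolves the nondeterministic choice.

```python
import random

# Hypothetical toy MDP (an assumption for illustration only).
# A maps each state to its available actions; p maps each
# (state, action) pair to a distribution over successor states.
A = {"s0": ["a", "b"], "s1": ["a"]}
p = {
    ("s0", "a"): {"s0": 0.5, "s1": 0.5},
    ("s0", "b"): {"s1": 1.0},
    ("s1", "a"): {"s0": 1.0},
}

def mdp_size(A):
    """Size of the MDP as defined in the text: the sum of |A(s)| over all states."""
    return sum(len(actions) for actions in A.values())

def sample_prefix(A, p, choose, s0, n, rng):
    """Return a finite path prefix s_0, a_0, s_1, ..., s_n as a flat list."""
    prefix, s = [s0], s0
    for _ in range(n):
        a = choose(s, A[s])          # phase 1: resolve the nondeterministic choice
        dist = p[(s, a)]
        # phase 2: draw the successor according to p(s, a)
        s = rng.choices(list(dist), weights=list(dist.values()))[0]
        prefix += [a, s]
    return prefix

rng = random.Random(0)
# Here the choice is resolved by always taking the first available action.
prefix = sample_prefix(A, p, lambda s, acts: acts[0], "s0", 5, rng)
```

The prefix alternates states and actions, so a prefix with $n$ transitions has $2n + 1$ entries.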

Let $S^+$ be the set of non-empty finite sequences of states. A (perfect-information) policy $\pi$ is
a mapping $\pi : S^+ \to \mathcal{D}(\mathit{Acts})$, which associates with each sequence of states $\bar{s} : s_0, s_1, \ldots, s_n \in S^+$
and each $a \in A(s_n)$ the probability $\pi(\bar{s})(a)$ of choosing $a$ after following the sequence of states
$\bar{s}$. We require that $\pi(\bar{s})(a) > 0$ implies $a \in A(s_n)$: a policy can choose only among the actions
that are available at the state where the choice is made. According to this definition, policies are
randomized, differently from the schedulers of [Var85, PZ86], which are deterministic. We indicate
with $\Pi$ the set of all policies. We say that a policy $\pi$ is memoryless if $\pi(\bar{s}, s) = \pi(\bar{t}, s)$ for all
$\bar{s}, \bar{t} \in S^*$ and all $s \in S$.

To define partial-information policies, we define partial-information relations. A partial-information
relation for an MDP $V = (S, \mathit{Acts}, A, p)$ is an equivalence relation $\sim\ \subseteq S \times S$ such that
for all $s \sim t$, we have $A(s) = A(t)$. If two states are related by $\sim$, then the states cannot be
distinguished by a partial-information policy. The condition on $\sim$ ensures that if two states cannot
be distinguished by the policy, then the policy can choose among the same actions at the two states.
Given two sequences of states $\bar{s} : s_0, \ldots, s_n$ and $\bar{t} : t_0, \ldots, t_m$, with $m, n \ge 0$, we write $\bar{s} \sim \bar{t}$ iff
$m = n$ and $s_i \sim t_i$ for all $0 \le i \le n$. Given an MDP $V$ and a partial-information relation $\sim$ for $V$,
we say that a policy $\pi$ is partial-information iff $\pi(\bar{s}) = \pi(\bar{t})$ for all $\bar{s}, \bar{t} \in S^+$ such that $\bar{s} \sim \bar{t}$. If the
relation $\sim$ has been fixed, we denote by PIPol the set of partial-information policies with respect
to $\sim$.
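As a minimal sketch of these definitions, one can represent the equivalence relation by an observation map `obs`, with two states related iff they have the same observation. This encoding, the helper names, and the toy states are assumptions made for illustration; the paper works with the relation directly.

```python
def is_partial_information_relation(A, obs):
    """Check the side condition: related states must offer the same actions."""
    action_sets = {}
    for s, actions in A.items():
        expected = action_sets.setdefault(obs[s], set(actions))
        if expected != set(actions):
            return False
    return True

def equivalent(seq_s, seq_t, obs):
    """Lift the relation to state sequences: equal length, pointwise related."""
    return len(seq_s) == len(seq_t) and all(
        obs[s] == obs[t] for s, t in zip(seq_s, seq_t)
    )

def lift_policy(obs_policy, obs):
    """Turn a policy over observation sequences into a policy over state sequences.

    The result is partial-information by construction, since it only ever
    inspects the observations of the states it is given.
    """
    return lambda seq: obs_policy(tuple(obs[s] for s in seq))

# Hypothetical states: a visible component and a hidden counter.
A = {("idle", 0): ["a", "b"], ("idle", 1): ["a", "b"], ("trying", 0): ["a"]}
obs = {s: s[0] for s in A}  # the policy sees only the first component
pi = lift_policy(
    lambda o: {"a": 1.0} if o[-1] == "trying" else {"a": 0.5, "b": 0.5}, obs
)
```

Every partial-information policy can be written in this lifted form, because its value on a sequence depends only on the equivalence classes of the states in that sequence.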

Once a policy $\pi$ has been selected, the Markov decision process is reduced to a purely probabilistic
process, and it becomes possible to define the probabilities of events. In particular, the
probability of following a finite path prefix $s_0, a_0, s_1, a_1, \ldots, s_n$ under policy $\pi \in \Pi$ is given by

$$\Pr_{s_0}^{\pi}(X_0 = s_0 \wedge Y_0 = a_0 \wedge \cdots \wedge X_n = s_n) = \prod_{i=0}^{n-1} p(s_i, a_i)(s_{i+1}) \, \pi(s_0, \ldots, s_i)(a_i) .$$
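The product formula for the probability of a finite path prefix can be evaluated directly. The sketch below assumes a prefix stored as a flat list of alternating states and actions, a policy given as a function from state histories to action distributions, and a hypothetical toy MDP; none of these encodings are prescribed by the paper.

```python
def prefix_probability(p, policy, prefix):
    """Probability of the prefix [s_0, a_0, s_1, ..., s_n] under the given policy."""
    states, actions = prefix[0::2], prefix[1::2]
    prob = 1.0
    for i, a in enumerate(actions):
        history = tuple(states[: i + 1])                # s_0, ..., s_i
        choice = policy(history).get(a, 0.0)            # pi(s_0, ..., s_i)(a_i)
        move = p[(states[i], a)].get(states[i + 1], 0.0)  # p(s_i, a_i)(s_{i+1})
        prob *= choice * move
    return prob

# Hypothetical toy MDP and a memoryless randomized policy.
p = {
    ("s0", "a"): {"s0": 0.5, "s1": 0.5},
    ("s0", "b"): {"s1": 1.0},
    ("s1", "a"): {"s0": 1.0},
}

def policy(history):
    return {"a": 0.5, "b": 0.5} if history[-1] == "s0" else {"a": 1.0}

prob = prefix_probability(p, policy, ["s0", "a", "s1", "a", "s0"])
```

For this prefix the two factors of the first step are 0.5 each and those of the second step are 1 each, so the product is 0.25.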

To extend this probability measure to subsets of infinite paths, for every state $s \in S$ we denote
by $\Theta_s$ the set of (infinite) paths having $s$ as initial state. Given two paths (or path prefixes) $\theta_1$
and $\theta_2$, we denote by $\theta_1 \preceq \theta_2$ the fact that $\theta_1$ is a prefix of $\theta_2$. Following the classical definition
of [KSK66], we let $\mathcal{B}_s \subseteq 2^{\Theta_s}$ be the $\sigma$-algebra of measurable subsets of $\Theta_s$, defined as the smallest
algebra that contains all the cylinder sets $\{\theta \in \Theta_s \mid \sigma \preceq \theta\}$, for $\sigma$ that ranges over all finite path
prefixes, and that is closed under complementation and countable unions (and hence also countable
intersections). The elements of $\mathcal{B}_s$ are called events, and they are the measurable sets of paths to
which we will associate a probability. For a measurable set of paths $\mathcal{A}$, we write $\Pr_s^{\pi}(\mathcal{A})$ to denote the probability
of the event $\mathcal{A}$ (restricted to the paths in $\Theta_s$) starting from the initial state $s \in S$ under policy $\pi$, and we write $E_s^{\pi}\{f\}$ to denote
the expectation of the random function $f : \Theta_s \to \mathbb{R}$ from initial state $s$ under policy $\pi$.

The  Confinement  Problem 

Given an MDP $V = (S, \mathit{Acts}, A, p)$, a subset $U \subseteq S$, a state $s \in S$, and a class of policies $C$, the
confinement problem consists in determining whether there is a policy $\pi \in C$ such that

$$\Pr_s^{\pi}(\forall k \ge 0 .\ X_k \in U) > 0 . \qquad (1)$$

It is known that for $C = \Pi$ the confinement problem can be solved in polynomial time with efficient
graph algorithms. We shall study the complexity of this problem for partial-information policies.
We note that the confinement problem is at the heart of several algorithms for the model checking
of pCTL* specifications [CY95, BdA95, BK98]; hence, the complexity of the confinement problem
directly affects the complexities of these model-checking problems.
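The text does not spell out the graph algorithm for the perfect-information case. One standard approach for finite MDPs, given here only as a sketch under that assumption, first computes the largest set $W \subseteq U$ from which some action keeps all successors inside $W$, and then checks whether $W$ can be reached from $s$ without leaving $U$. The function name and the toy MDP are hypothetical.

```python
def can_confine(A, p, U, s):
    """True iff some policy stays in U forever with positive probability.

    Assumes a finite MDP; A and p are dictionaries encoding the available
    actions and the transition distributions.
    """
    if s not in U:
        return False
    # Step 1: greatest fixpoint. Keep a state only while it has an action
    # all of whose possible successors are still in W.
    W = set(U)
    changed = True
    while changed:
        changed = False
        for t in list(W):
            safe = any(
                all(u in W for u, q in p[(t, a)].items() if q > 0) for a in A[t]
            )
            if not safe:
                W.discard(t)
                changed = True
    # Step 2: search for a state of W reachable from s through states of U.
    frontier, seen = [s], {s}
    while frontier:
        t = frontier.pop()
        if t in W:
            return True
        for a in A[t]:
            for u, q in p[(t, a)].items():
                if q > 0 and u in U and u not in seen:
                    seen.add(u)
                    frontier.append(u)
    return False

# Hypothetical toy MDP.
A = {"u1": ["a", "b"], "u2": ["a"], "out": ["a"]}
p = {
    ("u1", "a"): {"u1": 0.5, "out": 0.5},
    ("u1", "b"): {"u2": 1.0},
    ("u2", "a"): {"u2": 1.0},
    ("out", "a"): {"out": 1.0},
}
```

With $U = \{u1, u2\}$ the policy that picks `b` at `u1` is confined with probability 1; with $U = \{u1\}$ every policy leaves $U$ with probability 1, since repeating action `a` survives $n$ steps only with probability $0.5^n$.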

3  Long-Run  Average  Outcome 

The  long-run  average  properties  considered  in  [dA98a,  dA98b]  refer  to  the  average  outcome  of  a 
task,  which  is  repeated  infinitely  often  during  the  behavior  of  the  system.  During  a  task,  a  certain 
amount of outcome is accrued, indicating for instance the time required to complete the task, or
the successful or unsuccessful completion of the task. In our telecommunication example, a task
consists in trying to place a call; the outcome accrued is 1 if the call succeeds, and 0 if no connection
is available. The long-run average outcome of this task is equal to the long-run average fraction
of successful calls. Given an MDP $V = (S, \mathit{Acts}, A, p)$, we specify tasks and outcomes using two
labelings $r$ and $w$. The labeling $w : S \times \mathit{Acts} \to \{0, 1\}$ associates with each $s \in S$ and each $a \in A(s)$
the value 1 if taking action $a$ at $s$ signals the completion of a task, and value 0 otherwise. The
labeling $r : S \times \mathit{Acts} \to \mathbb{R}$ associates an outcome to each state-action pair. We say that a policy $\pi$
is proper from $s \in S$ if

$$\lim_{n \to \infty} E_s^{\pi}\Bigl\{ \sum_{k=0}^{n-1} w(X_k, Y_k) \Bigr\} = \infty ,$$

indicating that the system performs an infinite expected number of tasks from $s$. We denote
by $\mathrm{PropPol}(s) \subseteq \Pi$ the set of proper policies from $s$. Given $s \in S$ and a proper policy $\pi \in \mathrm{PropPol}(s)$,
we define the long-run average outcome $v_s^{\pi}$ of $\pi$ from $s$ by

$$v_s^{\pi} = \liminf_{n \to \infty} \frac{E_s^{\pi}\bigl\{ \sum_{k=0}^{n-1} r(X_k, Y_k) \bigr\}}{E_s^{\pi}\bigl\{ \sum_{k=0}^{n-1} w(X_k, Y_k) \bigr\}} .$$

Given a class of policies $C$, let $\mathrm{PropS}(C) = \{ s \in S \mid \mathrm{PropPol}(s) \cap C \ne \emptyset \}$ be the set of states with at
least one proper policy belonging to the class. The minimum long-run average outcome problem
consists in computing

$$v^-_C = \min_{s \in \mathrm{PropS}(C)} \ \inf_{\pi \in C \cap \mathrm{PropPol}(s)} v_s^{\pi} ,$$

assuming that $\mathrm{PropS}(C) \ne \emptyset$. If $C = \Pi$, this problem can be solved in polynomial time by a
reduction to linear programming [dA97, dA98a].


A  Simple  Telecommunication  Example 

The following example presents a discrete-time model of a telecommunication system similar to
the one discussed in the introduction. The example illustrates the use of nondeterminism for the
representation of approximate knowledge of transition probabilities. Consider a telecommunication
network, in which there is a total number $n > 0$ of connections available, and there is a
distinguished user $u_1$ that tries intermittently to place calls. We model this system by the MDP
$V = (S, \mathit{Acts}, A, p)$, where $S = \{\mathit{idle}, \mathit{trying}, \mathit{connected}\} \times \{0, \ldots, n\} \times \{0, 1\}$. In a state $(x, k, i) \in S$,
$x$ is the state of the user $u_1$, $k$ is the number of busy connections, and $i$ specifies whether it is $u_1$'s
turn ($i = 0$), or the network's turn ($i = 1$) of updating the state. The actions are $\mathit{Acts} = \{a, b\}$,
and we have $A(s) = \{a, b\}$ for every $s \in S$.

The number of free connections performs a random walk between 0 and $n$. From state $(x, k, 1)$,
under either action $a$ or $b$, we update the state as follows:

• If $2 \le k < n$, then we go with probability 1/2 to $(x, k - 1, 0)$ and with probability 1/2 to
$(x, k + 1, 0)$.

• If $k = 1$ and $x \ne \mathit{connected}$, then we go with probability 1/2 to $(x, k - 1, 0)$ and with
probability 1/2 to $(x, k + 1, 0)$.

• If $k = 1$ and $x = \mathit{connected}$, then we go to $(x, k + 1, 0)$.

• If $k = 0$, then we go to $(x, k + 1, 0)$.

• If $k = n$, then we go to $(x, k - 1, 0)$.

From  state  (idle,  k,  0),  if  the  action  a  is  chosen  we  proceed  to  state  (idle,  k,  1);  if  action  b  is  chosen, 
we  proceed  to  state  (trying,  k,  0).  From  state  (trying,  k,  0),  under  both  actions  a  and  b  we  proceed 
to  state  (connected,  k  +  1,1)  if  k  <  n ,  and  to  state  (idle,k,  1)  if  k  =  n.  Finally,  from  state 
(connected,  k,  0)  under  both  actions  a  and  b  we  proceed  to  state  (idle,  k  —  1,1):  for  simplicity,  we 
consider  only  unit-duration  phone  calls. 

To measure the long-run average fraction of successful calls, we define the labels $r$ and $w$ as
follows. We let $w((\mathit{trying}, k, 0), \ell) = 1$ for all $0 \le k \le n$ and all $\ell \in \{a, b\}$, so that $w$ counts
the number of attempted calls. We let $r((\mathit{trying}, n, 0), \ell) = 0$ and $r((\mathit{trying}, k, 0), \ell) = 1$ for all
$0 \le k < n$ and all $\ell \in \{a, b\}$, so that $r$ counts the number of successful calls. It is easy to check that
$v_s^{\pi}$ is equal to the fraction of successful calls from the initial state $s$ under policy $\pi$. The worst-case
value of this fraction under perfect-information policies is $v^-_{\Pi} = 0$. This worst-case value arises
when the user $u_1$ chooses action $b$ whenever there are no free connections, and action $a$ otherwise.
This is clearly an unrealistic worst-case value. A better estimate can be obtained by introducing
a partial-information relation $\sim$ defined by $(i, k_1, j) \sim (i, k_2, j)$ for all $i \in \{\mathit{idle}, \mathit{trying}, \mathit{connected}\}$,
all $0 \le k_1, k_2 \le n$, and all $j = 0, 1$. This partial-information relation prevents the user $u_1$ from
selecting actions $a$ and $b$ on the basis of the number of free connections. The worst-case fraction of
successful calls under partial-information policies $v^-_{\mathrm{PIPol}}$ provides a more realistic estimate of the
performance of the system.
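The contrast between the two kinds of policies can be observed in a small simulation of the model just described. This is only a sketch: the value `N = 3`, the seed, the function names, and the use of a finite-horizon ratio as a stand-in for the long-run average are all assumptions, and the code requires `N >= 2` so that the cases of the network update do not overlap.

```python
import random

N = 3  # hypothetical number of connections (n in the text); assumes N >= 2

def network_step(x, k, rng):
    """Network's turn, from (x, k, 1): random walk on the busy count k."""
    if k == 0 or (k == 1 and x == "connected"):
        return (x, k + 1, 0)
    if k == N:
        return (x, k - 1, 0)
    return (x, k + rng.choice((-1, 1)), 0)

def user_step(x, k, action):
    """User's turn, from (x, k, 0); returns (next_state, r, w)."""
    if x == "idle":
        nxt = ("idle", k, 1) if action == "a" else ("trying", k, 0)
        return nxt, 0, 0
    if x == "trying":  # an attempted call: w = 1, and r = 1 iff it succeeds
        if k < N:
            return ("connected", k + 1, 1), 1, 1
        return ("idle", k, 1), 0, 1
    return ("idle", k - 1, 1), 0, 0  # connected: the unit-duration call ends

def run(policy, steps, seed=0):
    """Return (successful calls, attempted calls) over a finite run."""
    rng = random.Random(seed)
    state, total_r, total_w = ("idle", 0, 0), 0, 0
    for _ in range(steps):
        x, k, turn = state
        if turn == 1:
            state = network_step(x, k, rng)
        else:
            state, r, w = user_step(x, k, policy(state))
            total_r += r
            total_w += w
    return total_r, total_w

def adversary(state):
    """Perfect-information policy: call only when every connection is busy."""
    return "b" if state[1] == N else "a"

def blind(state):
    """Memoryless partial-information policy: ignores k and always calls."""
    return "b"

adv_r, adv_w = run(adversary, 10_000)
blind_r, blind_w = run(blind, 10_000)
```

Under `adversary` no call ever succeeds, matching the worst-case value 0 under perfect information, whereas `blind` cannot avoid placing calls while connections are free. The ratio `blind_r / blind_w` is a simulation estimate for one particular partial-information policy, not the worst-case value over all of them.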

We note that introducing a partial-information relation is equivalent to assuming that there are
no factors external to the model that can influence the policies differently at states related by the
partial-information relation. While $u_1$ most likely cannot base his decision of calling on the number
of free connections, there might be external factors that make it more likely for $u_1$ to call when
more  connections  are  busy.  For  example,  in  countries  where  soccer  is  popular,  more  people  place 
telephone  calls  during  the  mid-game  intervals  than  during  the  game  proper.  If  such  external 
factors  are  not  accounted  for,  then  the  worst-case  long-run  fraction  of  successful  calls  computed 
under  partial-information  policies  is  an  optimistic  estimate  of  the  true  worst-case  long-run  average 
fraction.  Hence,  adding  partial-information  restrictions  to  the  policies  should  be  done  on  the  basis 
of  a  careful  examination  of  the  model. 

As an alternative to using partial information, we can increase the accuracy of the worst-case
estimates of the fraction of successful calls by reducing the role of nondeterminism and providing
more probabilistic information about the user's behavior. Specifically, suppose that we know that
the probability that the user will place a call when idle is between 0.1 and 0.2. To represent this
range, we modify the above model as follows. From state $(\mathit{idle}, k, 0)$ action $a$ (resp. $b$) leads to
$(\mathit{idle}, k, 1)$ with probability 0.9 (resp. 0.8), and to $(\mathit{trying}, k, 0)$ with probability 0.1 (resp. 0.2). In
this model, $v^-_{\Pi}$ may provide a realistic value for the fraction of successful calls, and the resolution
of the remaining nondeterminism under perfect information can account for correlations of events
not described by the model, such as the above-mentioned soccer-game phenomenon.
4  Complexity  of  Partial-Information  Confinement


In this section, we present the complexity results for the confinement problem under partial-information
policies. By reasoning as in [Rei84], it can be shown that the confinement problem
is EXPTIME-complete for partial-information policies. Moreover, we show that the confinement
problem is NP-complete for memoryless and limit-memoryless partial-information policies.

4.1  General  Partial-Information  Policies 

The  following  theorem  states  our  result  for  general  partial-information  policies. 

Theorem  1  The  confinement  problem  is  EXPTIME-complete  for  the  class  of  partial-information 
policies. 

Proof. The fact that the problem is in EXPTIME follows from the subset construction of [Rei84].
The lower bound follows by reasoning as in [Rei84] for "blindfold games", repeating infinitely many
times the simulation of the nondeterministic Turing machine. ■

Corollary 1 The problems of pCTL model checking and of the computation of the minimum
long-run average outcome are EXPTIME-hard for partial-information policies.

Proof. The result about pCTL model checking follows directly from Theorem 1 by considering
a property of the form $\Box U$, where by abuse of notation we denote by $U$ both a subset of states,
and a predicate defining such subset. The result about the minimum long-run average outcome
follows from Theorem 1 by considering an MDP in which the set $U$, once left, cannot be re-entered,
together with a function $w$ identically equal to 1, and a function $r$ defined by $r(s, a) = 1$ if $s \in U$
and $r(s, a) = 0$ if $s \notin U$, for all states $s$ and actions $a$. ■

4.2  Memoryless  and  Limit-Memoryless  Partial-Information  Policies 

A  memoryless  partial-information  policy  is  a  policy  that  is  both  nrenroryless  and  partial  information. 
We  denote  by  II mp  the  class  of  nrenroryless  partial-information  policies.  To  define  linrit-nrenroryless 
partial-information  policies,  for  all  s  £  S  and  a  £  A(s)  we  denote  by 

71—1 

N?ta  =  Y,&(Xk  =  sAYk  =  a) 

k= 0 

the  random  variable  indicating  the  number  of  times  that  the  state- action  pair  s,a  appears  in  the 
first  n  steps  of  a  path.  A  frequency-stable  policy  is  a  policy  ir  such  that  the  limit 

xf{s,  a)  =  linr  -E?{N™  }  (2) 

n — xx)  77,  ’ 

exists  for  all  t,s  £  S  and  a  £  A(s).  The  quantity  xj{s,  a)  is  the  frequency  of  state-action  pair  s,  a 
from  the  initial  state  t  under  policy  it.  A  policy  n  is  limit-memoryless  partial  information  if  it  is 
frequency  stable,  and  if  for  all  states  s,t,u  £  S  with  t  ~  u,  one  of  the  following  condition  holds: 


1.  either  xf{t,  a)  =  0  for  all  a  £  A(t); 


2.  or  Xg(u,a)  =  0  for  all  a  £  A(u); 

3.  or,  for  all  a  £  A(t), 


(3) 


Xs(t,a)  =  %s(u,  a) 

EbeA(t)xs(t,b)  J2beA(u)  xs(ui  b)  ' 

Together, these conditions state that each action is chosen with the same relative frequency at $t$
and at $u$, unless one of $t$ or $u$ has frequency 0 for all the actions. We denote by $\Pi_{LMP}$ the class
of limit-memoryless partial-information policies. We note that a memoryless partial-information
policy is also a limit-memoryless partial-information policy. Intuitively, a limit-memoryless
partial-information policy is a policy that can initially behave in an arbitrary way, but that in the long run
gives rise to state-action frequencies that correspond to those of a memoryless partial-information
policy.
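
The definitions above can be illustrated numerically. The following is a minimal sketch, not part of the original development: it approximates the limit (2) by the finite-horizon average $E\{N^n_{s,a}\}/n$ for a memoryless randomized policy, and then tests the three-way condition ending in (3) for one pair of equivalent states. The two-state MDP, the names `expected_frequencies` and `same_relative_frequencies`, and the assumption that $A(t) = A(u)$ are all hypothetical choices made for the example.

```python
from fractions import Fraction

def expected_frequencies(states, actions, p, policy, start, n):
    """Return E[N^n_{s,a}] / n, a finite-n approximation of the limit (2).

    p[(s, a)] maps successor states to probabilities;
    policy[s] maps each action to the probability of choosing it at s.
    """
    dist = {s: Fraction(0) for s in states}
    dist[start] = Fraction(1)
    counts = {(s, a): Fraction(0) for s in states for a in actions[s]}
    for _ in range(n):
        nxt = {s: Fraction(0) for s in states}
        for s in states:
            for a in actions[s]:
                mass = dist[s] * policy[s][a]      # Pr(X_k = s and Y_k = a)
                counts[(s, a)] += mass
                for t, q in p[(s, a)].items():
                    nxt[t] += mass * q
        dist = nxt
    return {sa: c / n for sa, c in counts.items()}

def same_relative_frequencies(x, actions, t, u, tol=1e-9):
    """Check cases 1-3 for an equivalent pair t ~ u (assumes A(t) = A(u))."""
    tot_t = sum(x[(t, b)] for b in actions[t])
    tot_u = sum(x[(u, b)] for b in actions[u])
    if tot_t == 0 or tot_u == 0:                   # cases 1 and 2
        return True
    return all(abs(x[(t, a)] / tot_t - x[(u, a)] / tot_u) <= tol
               for a in actions[t])                # case 3, equation (3)

# Hypothetical toy MDP with two equivalent states t ~ u.
states = ("t", "u")
actions = {"t": ("a", "b"), "u": ("a", "b")}
p = {("t", "a"): {"u": 1}, ("t", "b"): {"t": 1},
     ("u", "a"): {"t": 1}, ("u", "b"): {"u": 1}}
third = {"a": Fraction(1, 3), "b": Fraction(2, 3)}
half = {"a": Fraction(1, 2), "b": Fraction(1, 2)}

x_good = expected_frequencies(states, actions, p, {"t": third, "u": third}, "t", 50)
x_bad = expected_frequencies(states, actions, p, {"t": third, "u": half}, "t", 50)
```

In this sketch the policy that uses the same distribution at `t` and `u` passes the check, while the policy that distinguishes the two states does not.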

Theorem  2  The  confinement  problem  for  the  classes  of  memoryless  and  limit-memoryless  policies 
is  NP-complete. 

Proof. To see that the problems are in NP, note that it suffices to guess a deterministic
memoryless partial-information policy, and check that it satisfies (1). The proof of NP-hardness is by a
reduction from the SAT problem. Consider an instance of the SAT problem defined over a finite set
$Y = \{y_1, \ldots, y_k\}$ of variables. Let $L = Y \cup \{\bar{y} \mid y \in Y\}$ be the set of literals, and let $c_1, \ldots, c_n \subseteq L$
be the clauses composing the problem. The SAT problem consists in checking whether the
propositional formula $\bigwedge_{i=1}^{n} \bigvee_{\ell \in c_i} \ell$ is satisfiable. From this instance of the SAT problem, we construct an MDP
$V = (S, \mathit{Acts}, A, p)$ and a partial information relation $\sim$ as follows. The state space is

\[ S = \{s_0, s_1\} \cup \bigl( \{1, \ldots, k+1\} \times \{1, \ldots, n\} \times \{0, 1\} \bigr) . \]

We let $U = S \setminus \{s_0\}$, and we take $s_1$ as the initial state. A state of the form $(m, i, j)$ refers to the
occurrence of variable $y_m$ in clause $c_i$ (where $y_{k+1}$ is a dummy variable). The component $j$ of the
state keeps track of whether clause $c_i$ has already been satisfied by the variable assignment ($j = 1$) or
not ($j = 0$). The set of actions is $\mathit{Acts} = \{a, b\}$. We have $A((m, i, j)) = \{a, b\}$ for all $1 \le m \le k+1$,
all $1 \le i \le n$, and all $j \in \{0, 1\}$. Choosing action $a$ (resp. $b$) at state $(m, i, j)$ corresponds to choosing
the truth value true (resp. false) for $y_m$ in clause $c_i$. We let $A(s_0) = A(s_1) = \{a\}$. The transitions
are as follows. We let $p(s_0, a)(s_0) = 1$, so that $s_0$ is absorbing, and we let $p(s_1, a)((1, i, 0)) = 1/n$
for $1 \le i \le n$. For all $1 \le i \le n$ and $\xi \in \{a, b\}$, we let

\[ p((k+1, i, 1), \xi)(s_1) = 1 , \qquad p((k+1, i, 0), \xi)(s_0) = 1 , \]

so that if clause $c_i$ has been satisfied, we go back to $s_1$, and we proceed to $s_0$ otherwise. For
$1 \le m \le k$, $1 \le i \le n$, and $j \in \{0, 1\}$, the transitions from the other states are defined as follows:

• From $(m, i, 1)$, both $a$ and $b$ lead deterministically to $(m+1, i, 1)$.

• From $(m, i, 0)$, we have three cases:

- If $y_m \in c_i$, then $a$ leads deterministically to $(m+1, i, 1)$ and $b$ to $(m+1, i, 0)$.

- If $\bar{y}_m \in c_i$, then $a$ leads deterministically to $(m+1, i, 0)$ and $b$ to $(m+1, i, 1)$.

- If $y_m \notin c_i$ and $\bar{y}_m \notin c_i$, then both $a$ and $b$ lead deterministically to $(m+1, i, 0)$.

Finally, the partial information relation is defined by $(m, i_1, j_1) \sim (m, i_2, j_2)$ for all $1 \le m \le k+1$,
all $1 \le i_1, i_2 \le n$, and all $j_1, j_2 \in \{0, 1\}$. The states $s_0$ and $s_1$ are equivalent only to themselves.

The idea of the construction is as follows. From $s_1$, the process proceeds uniformly at random to
a state of the form $(1, i, 0)$, for $1 \le i \le n$. The following $k$ choices between actions $a$ and $b$ correspond
to the choice of a truth assignment for variables $y_1, \ldots, y_k$. If the truth assignment satisfies the
clause $c_i$, the process goes to $(k+1, i, 1)$; otherwise, it goes to $(k+1, i, 0)$. From $(k+1, i, 1)$, the
process goes back to $s_1$, and it selects randomly another clause to test. From $(k+1, i, 0)$, which
indicates that clause $c_i$ has not been satisfied, we go to $s_0 \notin U$, which indicates failure. Since the
policy does not know which one of the clauses $c_1, \ldots, c_n$ is being tested, the only way for the policy
to stay in $U$ forever with probability greater than 0 is to select a truth assignment that satisfies
simultaneously all the clauses. In the other direction, from a truth assignment that satisfies all
clauses we can immediately derive a memoryless partial-information policy that never leaves $U$.
Hence, the confinement problem has an affirmative answer iff the SAT instance is satisfiable. We
note that this proof also shows the NP-completeness of the confinement problem for general
partial-information policies. ∎
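
The reduction can be exercised on small instances. The sketch below is an illustration under stated assumptions, not the paper's algorithm: clauses are encoded as sets of nonzero integers (`m` for $y_m$, `-m` for $\bar{y}_m$), no clause is assumed to contain both a variable and its negation, and a deterministic memoryless partial-information policy is encoded as a map from the observable component $m$ to an action. Because every state of $U$ in this particular chain returns to $s_1$, staying in $U$ with positive probability is checked here simply as unreachability of $s_0$. The function names are hypothetical.

```python
from itertools import product

def step(state, action, clauses, k):
    """Successor distribution p(state, action) of the MDP built from the clauses."""
    n = len(clauses)
    if state == "s0":
        return {"s0": 1.0}                         # s0 is absorbing
    if state == "s1":
        return {(1, i, 0): 1.0 / n for i in range(1, n + 1)}
    m, i, j = state
    if m == k + 1:                                 # dummy level: report the outcome
        return {("s1" if j == 1 else "s0"): 1.0}
    if j == 0:
        literal = m if action == "a" else -m       # a = true, b = false
        j = 1 if literal in clauses[i - 1] else 0
    return {(m + 1, i, j): 1.0}

def confined(policy, clauses, k):
    """True if s0 is unreachable from s1 when actions depend only on m."""
    seen, stack = set(), ["s1"]
    while stack:
        s = stack.pop()
        if s in seen:
            continue
        seen.add(s)
        action = policy[s[0]] if isinstance(s, tuple) else "a"
        stack.extend(step(s, action, clauses, k))  # push the successor states
    return "s0" not in seen

def confinement_by_search(clauses, k):
    """Try every deterministic memoryless partial-information policy."""
    for choice in product("ab", repeat=k):
        policy = {m + 1: c for m, c in enumerate(choice)}
        policy[k + 1] = "a"                        # irrelevant at the dummy level
        if confined(policy, clauses, k):
            return policy
    return None

satisfiable = [{1, 2}, {-1, 2}, {-2, 3}]           # (y1 or y2)(not y1 or y2)(not y2 or y3)
unsatisfiable = [{1}, {-1}]                        # y1 and not y1
```

The search enumerates all $2^k$ assignments, which is consistent with the hardness result: the sketch is only meant to make the correspondence between policies and truth assignments concrete.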

Corollary 2 The problems of pCTL model checking and of the computation of the minimum
long-run average outcome are NP-hard for memoryless or limit-memoryless partial-information
policies.

The  proof  of  this  corollary  is  similar  to  the  proof  of  Corollary  1. 

5 Verification under Memoryless Partial-Information Policies

In  this  section,  we  show  how  the  minimum  long-run  average  outcome  under  memoryless  or  limit- 
memoryless  partial-information  policies  corresponds  to  the  solution  of  a  nonlinear  optimization 
problem.  Even  though  solving  this  problem  is  NP-hard,  as  shown  in  the  previous  section,  we  can 
use  techniques  for  the  approximate  solution  of  nonlinear  optimization  problems  to  obtain  upper 
bounds  for  the  minimum  long-run  average  outcome.  These  upper  bounds  can  be  used  in  the 
analysis  of  the  performance  of  the  system.  An  overview  of  techniques  for  the  solution  of  nonlinear 
optimization  problems  can  be  found  in  [Ber95a]. 

Restricting  the  attention  to  memoryless  or  limit-memoryless  partial-information  policies,  rather 
than  considering  general  ones,  is  often  not  a  drawback.  In  fact,  it  is  possible  to  model  as  part  of 
the  state  of  the  system  any  information  about  the  past  history  of  the  system  that  can  influence  the 
resolution  of  nondeterminism.  Additionally,  the  goal  of  partial  information  is  to  limit  the  power  of 
the  demonic  resolution  of  nondeterminism;  often,  the  further  limitation  of  lack  of  memory  is  quite 
natural  in  a  performance-evaluation  setting.  In  particular,  if  nondeterminism  is  used  to  model 
unknown  values  for  transition  probabilities,  rather  than  concurrency,  then  it  is  appropriate  to 
resolve  nondeterminism  in  a  memoryless  fashion.  Finally,  we  recall  that  under  perfect  information, 
there  are  always  worst-case  policies  for  pCTL  and  long-run  average  outcome  specifications  that 
are  memoryless.  Hence,  the  consideration  of  memoryless  policies  to  compute  the  worst  case  under 
partial  information  is  a  fairly  natural  extension. 

Consider an MDP $V = (S, \mathit{Acts}, A, p)$, together with two labelings $w : S \times \mathit{Acts} \to \{0, 1\}$ and
$r : S \times \mathit{Acts} \to \mathbb{R}$. Assume also that $\mathit{PropS}(\Pi) \neq \emptyset$ and $\mathit{PropS}(\Pi_{LMP}) \neq \emptyset$. The minimum long-run
average outcome under perfect-information policies can be computed by solving the following
linear-programming problem [Ber95b, dA98a].

LP Problem P1. Set of variables: $\{\lambda\} \cup \{h_s \mid s \in S\}$.

Maximize $\lambda$ subject to:

\[ h_s \le r(s, a) - \lambda\, w(s, a) + \sum_{t \in S} p(s, a)(t)\, h_t \qquad \text{for all } s \in S \text{ and } a \in A(s) \]

To compute the minimum long-run average outcome under the classes $\Pi_{MP}$ and $\Pi_{LMP}$, we take the
dual of the above linear-programming problem, and we add a (nonlinear) constraint encoding (3).
The resulting nonlinear-programming problem is given below.

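Optimization Problem P2. Set of variables: $\{x_{s,a} \mid s \in S \wedge a \in A(s)\}$.

Minimize $\sum_{s \in S} \sum_{a \in A(s)} x_{s,a}\, r(s, a)$ subject to:

\[ x_{s,a} \ge 0 \qquad \text{for all } s \in S \text{ and } a \in A(s) \qquad (4) \]

\[ \sum_{s \in S} \sum_{a \in A(s)} x_{s,a}\, p(s, a)(t) = \sum_{b \in A(t)} x_{t,b} \qquad \text{for all } t \in S \qquad (5) \]

\[ \sum_{s \in S} \sum_{a \in A(s)} x_{s,a}\, w(s, a) = 1 \qquad (6) \]

\[ x_{s,a} \sum_{b \in A(t)} x_{t,b} = x_{t,a} \sum_{b \in A(s)} x_{s,b} \qquad \text{for all } s, t \in S \text{ with } s \sim t \text{ and all } a \in A(s) \qquad (7) \]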


The meaning of this optimization problem is as follows. For all $s \in S$ and $a \in A(s)$, the variables
$x_{s,a}$ are proportional to the state-action frequencies defined in Section 4.2. Equation (4) simply
states that all variables are non-negative. Equation (5) is a flow constraint, requiring that for every
state,  the  frequency  of  entering  the  state  is  equal  to  the  frequency  of  leaving  it.  Equation  (6)  is 
a  normalization  constraint,  that  renormalizes  the  state-action  frequencies  so  that  the  (adjusted) 
frequency  of  completing  a  task  is  1.  Equation  (7)  encodes  directly  the  constraint  (3).  The  goal  of 
the  optimization  problem  is  to  minimize  the  outcome  received  per  unit  of  frequency.  Because  of 
(6),  this  is  equivalent  to  minimizing  the  outcome  per  task.  The  following  theorem  states  that  the 
above  optimization  problem  computes  the  desired  quantity. 

Theorem  3  The  solution  of  the  nonlinear  programming  problem  P2  is  equal  to  the  minimum 
long-run  average  outcome  under  memoryless  or  limit-memoryless  partial-information  policies. 
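
Any feasible point of P2 yields an upper bound on its minimum, which suggests a simple numerical experiment. The sketch below is a toy illustration under assumptions that are not in the paper: a hypothetical two-state MDP whose states are observationally equivalent, $w$ identically 1, and a one-parameter family of memoryless partial-information policies choosing action `a` with probability `q` in both states. For each `q` it builds the point $x_{s,a}$ from the stationary distribution of the induced chain, checks constraints (5), (6) and (7), and scans a grid for the smallest objective value. A grid scan stands in for a proper nonlinear-programming method.

```python
STATES = (0, 1)                        # 0 ~ 1: the policy cannot tell them apart
ACTIONS = ("a", "b")
# P[(s, a)][t] = p(s, a)(t); all numbers are made up for the example.
P = {(0, "a"): (0.5, 0.5), (0, "b"): (0.9, 0.1),
     (1, "a"): (0.5, 0.5), (1, "b"): (0.3, 0.7)}
R = {(0, "a"): 1.0, (0, "b"): 4.0, (1, "a"): 3.0, (1, "b"): 0.0}
W = {sa: 1.0 for sa in P}              # every step completes one task

def feasible_point(q):
    """Point x of P2 induced by choosing a with probability q in both states."""
    prob = {"a": q, "b": 1.0 - q}
    leave0 = sum(prob[a] * P[(0, a)][1] for a in ACTIONS)   # Pr(0 -> 1)
    leave1 = sum(prob[a] * P[(1, a)][0] for a in ACTIONS)   # Pr(1 -> 0)
    total = leave0 + leave1
    pi = (leave1 / total, leave0 / total)                   # stationary distribution
    x = {(s, a): pi[s] * prob[a] for s in STATES for a in ACTIONS}
    scale = sum(x[sa] * W[sa] for sa in x)                  # normalization (6)
    return {sa: v / scale for sa, v in x.items()}

def objective(x):
    """Outcome per task: the quantity minimized in P2."""
    return sum(x[sa] * R[sa] for sa in x)

def max_violation(x):
    """Largest absolute violation of constraints (5), (6) and (7)."""
    def out(s):
        return sum(x[(s, b)] for b in ACTIONS)
    flow = [abs(sum(x[sa] * P[sa][t] for sa in x) - out(t)) for t in STATES]
    norm = [abs(sum(x[sa] * W[sa] for sa in x) - 1.0)]
    same = [abs(x[(0, a)] * out(1) - x[(1, a)] * out(0)) for a in ACTIONS]
    return max(flow + norm + same)

grid = [i / 100 for i in range(101)]
best_q = min(grid, key=lambda q: objective(feasible_point(q)))
upper_bound = objective(feasible_point(best_q))
```

On this toy instance the smallest grid value is attained at a properly randomized policy rather than at either deterministic one, so a search restricted to deterministic policies would give a weaker bound here.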

6 Conclusions

In this paper, we argued that the accurate estimation of worst-case performance properties of
systems that include probabilistic and nondeterministic choice requires the consideration of
partial-information policies. On the other hand, we showed that even for memoryless partial-information
policies, the problem of computing the worst-case long-run average outcome is NP-hard. We then
presented a nonlinear optimization problem whose solution enables the computation of performance
indices of a system under partial-information policies.

These  results  point  to  some  future  directions  for  the  modeling  and  analysis  of  long-run  average 
properties of probabilistic systems (such as performance). One direction consists in using
nondeterminism in the model sparingly, remembering that it will be resolved under perfect information,
and  relying  on  a  manual  inspection  of  the  worst-case  scenarios  to  determine  their  plausibility.  A 
second  direction  of  research  consists  in  identifying  a  concept  that  captures  some  of  the  relevant 
features  of  partial-information  policies,  while  leading  to  polynomial-time  verification  algorithms. 
A third direction consists in studying the system under memoryless or limit-memoryless
partial-information policies, and in devising algorithms that, although the underlying problems are
NP-hard in the worst case, exhibit good average-case complexity for typical system models.